281 research outputs found

    Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    BACKGROUND: The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever increasing amount of microarray data from cancer studies. Although similar questions for the same type of cancer are addressed in these different studies, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods. RESULTS: In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that important genes with regard to the biology of leukemia are selected in an integrated analysis, which are missed in either single-set analysis. CONCLUSION: Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and microarray technologies. 
    Predictive models generated by this approach are better validated than those generated on a single data set, while showing high predictive power and improved generalization performance.
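    The quantile discretization step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `quantile_discretize`, the bin count, and the toy values are assumptions chosen for illustration. The idea shown is only that values measured on different scales (for example cDNA log ratios and oligonucleotide intensities) become comparable once each sample is reduced to within-sample quantile bins.

```python
def quantile_discretize(values, n_bins=4):
    """Map each expression value of ONE sample to a quantile bin (0..n_bins-1).

    Hypothetical helper: bins are assigned from the within-sample rank,
    so the absolute measurement scale of the platform drops out.
    Ties are broken by input order (sorted() is stable).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


# Toy data (assumed): the same four genes measured on two platforms.
cdna_sample = [0.2, -1.3, 2.1, 0.7]             # e.g. log ratios
oligo_sample = [510.0, 42.0, 9800.0, 1200.0]    # e.g. raw intensities

print(quantile_discretize(cdna_sample))
print(quantile_discretize(oligo_sample))
```

    Both toy samples order the genes identically, so they receive the same bin vector even though their raw scales differ by orders of magnitude. The discretized vectors could then serve as features for a classifier such as the support vector machines named in the abstract.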

    Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets

    Background: Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that such an assumption is generally incorrect and, consequently, that such methods may erroneously associate the biology of a particular geneset with cancer prognosis. Results: To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes based on a permutation approach, and tests whether predefined sets of biologically related genes are associated with survival. A comparison with a published theme-driven approach revealed non-uniform distributions, suggesting a significant problem with false positive rates in the original study. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant correlations of "fatty acid metabolism" with overall survival in breast cancer, and of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer. Conclusions: Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach provides a method to correct for this pitfall, and provides a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
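    The second, empirical test described in this abstract (comparing a geneset of interest with random genesets of the same size) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published method: the function name `empirical_geneset_pvalue` is hypothetical, and the mean of per-gene scores is used as a stand-in for whatever survival statistic a real analysis would compute per geneset.

```python
import random


def empirical_geneset_pvalue(gene_scores, geneset, n_random=1000, seed=0):
    """Empirical p-value of a geneset against random genesets of equal size.

    gene_scores: dict mapping gene -> association score (assumed stand-in
                 for a per-geneset survival statistic).
    Returns (1 + hits) / (1 + n_random), where hits counts random genesets
    scoring at least as high as the geneset of interest.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    genes = sorted(gene_scores)

    def set_score(members):
        return sum(gene_scores[g] for g in members) / len(members)

    observed = set_score(geneset)
    hits = sum(
        set_score(rng.sample(genes, len(geneset))) >= observed
        for _ in range(n_random)
    )
    return (1 + hits) / (1 + n_random)


# Toy data (assumed): 20 genes whose score equals their index.
toy_scores = {f"g{i:02d}": i for i in range(20)}
top_set = ["g17", "g18", "g19"]        # strongest possible theme
bottom_set = ["g00", "g01", "g02"]     # weakest possible theme

print(empirical_geneset_pvalue(toy_scores, top_set))
print(empirical_geneset_pvalue(toy_scores, bottom_set))
```

    Because the null distribution is built from random genesets drawn from the same data, no uniformity assumption about their p-values is needed, which is the pitfall the abstract highlights.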

    New Insights into the Genetic Regulation of Plasmodium Falciparum Obtained by Bayesian Modeling

    The most fatal and prevalent form of malaria is caused by the bloodborne pathogen Plasmodium falciparum (henceforth P.f). Annually, approximately three million people die of malaria. Despite the devastating global effect of P.f, the vast majority of its proteins have not been characterized experimentally. In this work, we provide computational insights that explore the modalities of regulation for some important groups of P.f genes, namely components of the glycolytic pathway and those involved in apicoplast metabolism. Glycolysis is a crucial pathway in the maintenance of the parasite, while the recently discovered apicoplast contains a range of metabolic pathways and housekeeping processes that differ radically from those of the host, which makes it an ideal target for drug therapy.

    Explainable artificial intelligence for omics data: a systematic mapping study

    Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared in their respective research community. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies’ AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners

    A scoping review of distributed ledger technology in genomics: thematic analysis and directions for future research

    Objective: Rising interest in distributed ledger technology (DLT) and genomics has sparked various interdisciplinary research streams, with a proliferating number of scattered publications investigating the application of DLT in genomics. This review aims to uncover the current state of research on DLT in genomics, in terms of focal research themes and directions for future research. Materials and Methods: We conducted a scoping review and thematic analysis. To identify the 60 relevant papers, we queried Scopus, Web of Science, PubMed, ACM Digital Library, IEEE Xplore, arXiv, and bioRxiv. Results: Our analysis resulted in 7 focal themes on DLT in genomics discussed in literature, namely: (1) Data economy and sharing; (2) Data management; (3) Data protection; (4) Data storage; (5) Decentralized data analysis; (6) Proof of useful work; and (7) Ethical, legal, and social implications. Discussion: Based on the identified themes, we present 7 future research directions: (1) Investigate opportunities for the application of DLT concepts other than Blockchain; (2) Explore people’s attitudes and behaviors regarding the commodification of genetic data through DLT-based genetic data markets; (3) Examine opportunities for joint consent management via DLT; (4) Investigate and evaluate data storage models appropriate for DLT; (5) Research the regulation-compliant use of DLT in healthcare information systems; (6) Investigate alternative consensus mechanisms based on Proof of Useful Work; and (7) Explore DLT-enabled approaches for the protection of genetic data ensuring user privacy. Conclusion: While research on DLT in genomics is currently growing, there are many unresolved problems. This literature review outlines extant research and provides future directions for researchers and practitioners.

    Promiscuous gene expression in thymic epithelial cells is regulated at multiple levels

    The role of central tolerance induction has recently been revised after the discovery of promiscuous expression of tissue-restricted self-antigens in the thymus. The extent of tissue representation afforded by this mechanism and its cellular and molecular regulation are barely defined. Here we show that medullary thymic epithelial cells (mTECs) are specialized to express a highly diverse set of genes representing essentially all tissues of the body. Most, but not all, of these genes are induced in functionally mature CD80hi mTECs. Although the autoimmune regulator (Aire) is responsible for inducing a large portion of this gene pool, numerous tissue-restricted genes are also up-regulated in mature mTECs in the absence of Aire. Promiscuously expressed genes tend to colocalize in clusters in the genome. Analysis of a particular gene locus revealed expression of clustered genes to be contiguous within such a cluster and to encompass both Aire-dependent and -independent genes. A role for epigenetic regulation is furthermore implied by the selective loss of imprinting of the insulin-like growth factor 2 gene in mTECs. Our data document a remarkable cellular and molecular specialization of the thymic stroma in order to mimic the transcriptome of multiple peripheral tissues and, thus, maximize the scope of central self-tolerance.

    Microarray-based approach identifies microRNAs and their target functional patterns in polycystic kidney disease

    Background: MicroRNAs (miRNAs) play key roles in mammalian gene expression and several cellular processes, including differentiation, development, apoptosis and cancer pathomechanisms. Recently the biological importance of primary cilia has been recognized in a number of human genetic diseases. Numerous disorders are related to cilia dysfunction, including polycystic kidney disease (PKD). Although the involvement of certain genes and transcriptional networks in PKD development has been shown, little is known about how they are regulated at the molecular level. Results: Given the emerging role of miRNAs in gene expression, we explored the possibility of miRNA-based regulation in PKD. Here, we analyzed the simultaneous expression changes of miRNAs and mRNAs by microarrays. In total, 935 genes, classified into 24 functional categories, were differentially regulated between PKD and control animals. In parallel, 30 miRNAs were differentially regulated in PKD rats: our results suggest that several miRNAs might be involved in regulating genetic switches in PKD. Furthermore, we describe miRNAs newly detected in the kidney, miR-31 and miR-217, which have not been reported previously. We determine functionally related gene sets, or pathways, to reveal the functional correlation between differentially expressed mRNAs and miRNAs. Conclusion: We find that the functional patterns of predicted miRNA targets and differentially expressed mRNAs are similar. Our results suggest an important role of miRNAs in specific pathways underlying PKD.